3.2 Bad Assumptions about Data Types



So what types of error should concern you? Most of the problems result from assumptions, implicit or explicit, about either the absolute or relative sizes of the int, long, and pointer data types. Here are common faulty assumptions that undermine 64-bit porting:

sizeof(int) == sizeof(void*)
This assumption occurs when a pointer is cast to an int to perform pointer arithmetic. The assumption can also occur when a union is used to hold both an int and a pointer, or when an int or pointer is passed as a parameter to a routine actually requiring the opposite type.
Provided prototypes are used, good compilers detect the casting and parameter passing problems. The union problem is subtler and harder for compilers to detect.
One way to avoid the pointer arithmetic problem is to use the ANSI C ptrdiff_t type found in the stddef.h header file. This signed integer type is the type of the result of subtracting two pointers, and it grows to 64 bits along with pointers under LP64, so it is the natural type for portably holding the distance between two addresses. For example, instead of:
```
int
ptr_dist(void *a, void *b) {
  return (int) a - (int) b;
}
```
write:
```
#include <stddef.h>
ptrdiff_t
ptr_dist(void *a, void *b) {
  return (ptrdiff_t) a
         - (ptrdiff_t) b;
}
```
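As a hedged alternative sketch, the casts to an integer type can be avoided altogether: subtracting two char pointers already yields a ptrdiff_t measured in bytes. The name byte_dist is hypothetical, and the C standard only defines the result when both pointers point into the same array or object.

```c
#include <stddef.h>

/* Hypothetical helper: distance in bytes from b to a.
   Both pointers are assumed to point into the same object. */
ptrdiff_t
byte_dist(const void *a, const void *b) {
  /* Pointer subtraction yields a ptrdiff_t directly. */
  return (const char *) a - (const char *) b;
}
```

For a 16-byte buffer buf, byte_dist(buf + 10, buf) is 10 under both the 32BIT and LP64 models.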
sizeof(int) == sizeof(long)
This assumption is similar to the previous assumption. In the 32BIT model, these two types are both logically integers and are of the same size, so it is common for a routine to expect, for instance, an int and still work correctly when passed a long. Be particularly careful when mixing signed and unsigned versions of long and int, since a value may be unintentionally sign-extended (or zero-extended) when it is widened. Again, proper prototypes catch or avoid most such errors.
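The widening behavior can be sketched as follows. This is an illustration only: it assumes the C99 stdint.h fixed-width types (which postdate this text) so that int32_t stands in for a 32-bit int and int64_t for a 64-bit long on any model, and the function names are hypothetical.

```c
#include <stdint.h>

/* Signed source: the sign bit is copied into the upper 32 bits. */
int64_t
widen_signed(int32_t v) {
  return v;
}

/* Unsigned source: the upper 32 bits are filled with zeros. */
int64_t
widen_unsigned(uint32_t v) {
  return v;
}
```

The same 32-bit pattern, 0xffffffff, widens to -1 through the signed path and to 4,294,967,295 through the unsigned path.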
sizeof(long) == 4
This assumption manifests itself when a long (often embedded in a structure) is used to map external data representations intended to be 32 bits. This can often happen when reading or writing binary data files or encoding or decoding protocols. The following 32-bit code fragment would not be portable to a 64-bit system for this reason:
```
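#include <stdio.h>
/* header struct changes size
   when compiled LP64! */
struct header {
  long tag;
  long length;
};
int main(int argc, char **argv) {
  FILE *file;
  struct header info;
  /* "rb": the file holds binary data */
  file = fopen("data", "rb");
  if (file == NULL)
    return 1;
  /* reads 8 bytes on 32BIT, 16 on LP64 */
  fread(&info, sizeof(info), 1, file);
  fclose(file);
  return 0;
}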
```
The info variable has a different size on an LP64 system than on a 32BIT system, so a different number of bytes is read, and the tag and length members get different values on each system when reading the same file. Unfortunately, this type of error is not caught by the compiler.
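One possible way around the problem, sketched here under stated assumptions, is to read the external representation as raw bytes and assemble each field explicitly. The sketch assumes the C99 stdint.h types and a file format of two 32-bit fields stored most significant byte first; the byte order and the names header32, get_be32, decode_header32, and read_header32 are hypothetical, not taken from any real format.

```c
#include <stdint.h>
#include <stdio.h>

/* The same two fields, pinned to 32 bits each. */
struct header32 {
  uint32_t tag;
  uint32_t length;
};

/* Build a 32-bit value from four bytes, most significant first. */
static uint32_t
get_be32(const unsigned char *p) {
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
       | ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}

/* Decode the 8-byte external form; independent of sizeof(long). */
void
decode_header32(const unsigned char raw[8], struct header32 *out) {
  out->tag    = get_be32(raw);
  out->length = get_be32(raw + 4);
}

/* Returns 1 on success, 0 if fewer than 8 bytes could be read. */
int
read_header32(FILE *file, struct header32 *out) {
  unsigned char raw[8];
  if (fread(raw, 1, sizeof(raw), file) != sizeof(raw))
    return 0;
  decode_header32(raw, out);
  return 1;
}
```

Because the number of bytes read is fixed at eight and each field is built byte by byte, the result depends on neither the size of long nor the byte order of the host.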
Also, do not assume a long is four bytes when using unions. The following example is not portable:
```
#include <stdio.h>
union {
  char c[4];
  long l;
} combo;
int main(int argc, char **argv) {
  combo.c[0] = 'c';
  combo.c[1] = 'a';
  combo.c[2] = 't';
  combo.c[3] = '\0';
  /* wrong if sizeof(long) != 4 */
  if(combo.l == 0x63617400)
    printf("Big endian\n");
  else
    printf("Little endian\n");
  return 0;
}
```
This code could be used to determine the byte order of a 32-bit system, but would not work correctly on a 64-bit LP64 system, where a long is eight bytes rather than four.
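A version of the byte order test that does not depend on the size of long might look like the sketch below. It assumes the C99 uint32_t type, which is exactly four bytes in every data model, and an ASCII character set (as the original fragment does); the name is_big_endian is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a big-endian host, 0 otherwise. */
int
is_big_endian(void) {
  const unsigned char bytes[4] = { 'c', 'a', 't', '\0' };
  uint32_t word;
  /* Copy exactly four bytes into a four-byte integer. */
  memcpy(&word, bytes, sizeof(word));
  return word == UINT32_C(0x63617400);
}
```

Using memcpy rather than a union keeps the four bytes and the integer the same size by construction.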
sizeof(void*) == 4
This assumption is analogous to the previous one. Because use of pointers in file and protocol formats is dubious, this assumption is rare.
Assumptions about constants and arithmetic.
Size changes to the integer types can produce unexpected results from arithmetic using constants, because the conversions between signed and unsigned types change. Be particularly careful when using constants with the high-order bit set. For example:
```
long x, y;
x = 3;
/* 32BIT truncates y to 32 bits,
   LP64 does not */
y = x + 0xffffffff;
```
In the 32BIT model, the result is 2. In the LP64 model, the result is 4,294,967,298. The 0xffffffff constant is treated as an unsigned constant in both cases, but in the 32-bit model, the result is truncated to 32 bits.
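Both outcomes can be reproduced on a single machine with a small sketch. It assumes the C99 stdint.h types to model a 32-bit and a 64-bit long side by side; the function names are hypothetical.

```c
#include <stdint.h>

/* The addition as performed when long is 32 bits:
   unsigned 32-bit arithmetic, which wraps around. */
int32_t
add_32bit_model(int32_t x) {
  return (int32_t) ((uint32_t) x + UINT32_C(0xffffffff));
}

/* The addition as performed under LP64: the constant is
   widened to 64 bits first, so nothing wraps. */
int64_t
add_lp64_model(int64_t x) {
  return x + INT64_C(0xffffffff);
}
```

With x equal to 3, the first function returns 2 and the second returns 4,294,967,298, matching the two results described above.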
Beware of other assumptions about how the results of integer operations are truncated. This fragment demonstrates how different models handle a shift operation:
```
unsigned long a = 0xff000000;
if((a << 8) == 0)
  printf("32BIT, PC, or LLP64\n");
else
  printf("LP64 or ILP64\n");
```
In the LP64 case, the set bits shifted past bit 31 are kept rather than discarded as in the 32-bit case. Compilers can warn you about most of these assumptions.
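If code relies on the 32-bit truncation, one option is to make it explicit with a mask so that every model gives the same answer. The following is a minimal sketch; the name shl8_32 is hypothetical.

```c
/* Shift left by 8 and keep only the low 32 bits, whatever
   the width of unsigned long is on the current model. */
unsigned long
shl8_32(unsigned long a) {
  return (a << 8) & 0xffffffffUL;
}
```

With the mask in place, shl8_32(0xff000000UL) is 0 under both the 32BIT and LP64 models.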
Be aware that you can use the L and U suffixes (lowercase l and u are also valid) to mark integer constants as long and unsigned, respectively. These suffixes can be used in combination to indicate an unsigned long constant.
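As a brief illustration of the suffixes, the sketch below uses 1UL so that a shift is carried out at the full width of unsigned long instead of the width of int. The helper names bit_mask and ulong_bits are hypothetical.

```c
#include <limits.h>

/* 1UL has type unsigned long, so the shift is done at the
   full width of long: 32 bits on 32BIT, 64 bits on LP64. */
unsigned long
bit_mask(unsigned int n) {
  /* n must be less than the width of unsigned long */
  return 1UL << n;
}

/* Width of unsigned long in bits on the current model. */
unsigned int
ulong_bits(void) {
  return (unsigned int) (sizeof(unsigned long) * CHAR_BIT);
}
```

Written with a plain 1, the same shift would be performed in int, which cannot represent the result once n reaches the sign bit of a 32-bit int.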


Mark Kilgard
Sat Dec 30 11:52:07 PST 1995